All Databases MacTech Vol 02-1986

Multi-tasking

Volume Number: 2

Issue Number: 4

Column Tag: Threaded Code: Mach 1

A Multi-tasking Forth System

By Jörg Langowsk, EMBL, C/O I.L.L., Grenoble, Cedex, France,

MacTutor Editorial Board

"Subroutine threaded Forth - the Mach 1 system

This time I am going to introduce to you a new Forth-83 system that I recently

got to review. This system is implemented in a somewhat different way compared with

MacForth, NEON or other Forths, so let me digress a little and make some comments

about Forth implementations on 68000 based systems.

As most of this column's readers know, 'threaded code' means program code that

is organized in a very hierarchical manner; a 'main program' or colon definition

consists of a list of words that describe other lower-level definitions that in turn are

lists of even lower level definitions and so on ... until one arrives at the definitions of

the very lowest level that are directly executable machine code (See Figure 1 below).

How do Forth systems implement this structure? The definition of every Forth

word, colon definition, variable, etc. contains one short piece of executable machine code

at the very beginning. This is called the code field and determines in which way the

information following it is handled. For example, in a colon definition in MacForth, the

code field contains a TRAP #F (V2.0) or TRAP #E (V2.4) instruction that jumps to

code that in turn:

-pushes the instruction pointer (A3) on the return stack,

-sets the instruction pointer to the address of the first word of the new definition to

be executed,

-jumps to the NEXT routine which executes the tokens that follow the code field.

Mac Forth is an example of 'token threaded Forth' which means that the numbers

following the code field are not valid addresses of other Forth words, but 'tokens' that

are to be converted into addresses by some algorithm. In Mac Forth, positive token

values are interpreted as offsets from a base pointer (A4), while negative token values

are negative offsets from the top of application space (A5) into a token table, where the

32-bit address of the Forth word is found. The token table is used to handle programs

that exceed the 32 K address space that can be handled through the first method of

addressing.

The advantage of MacForth's and similar structures is that we need only 16 bits

for each word compiled into a definition. The disadvantage is the limited address space

(32K) for the simple A4-indexed addressing mode and the need for a token table to

convert tokens into addresses that lie outside the 32K addressing region. Time is lost on

each NEXT loop by having to test whether the token is negative or positive, then

eventually by accessing the token table ( See Figure 2 ).

The Forth implementation that was used to create NEON does not need a token

table since each word uses 32 bits, but more space is used in the compiled definitions

that way.

Fig. 3 gives a summary of some possible addressing modes for Forth interpreters

with their execution times (in system clocks, from the MC68000 manual). Some of

these examples have been taken from a Dr. Dobb's Journal article by Joe Barnhart,

'Forth and the Motorola 68000', DDJ 83 (1983) 18-26.

MacForth does not conform to any of the 'pure' definitions here, but (cf. Fig. 2)

is a mixture of a base-offset direct word and base-offset indirect long threaded code

(with an additional branch required).

You see that in all the examples given the inner interpreter requires

considerable overhead to 'set up' the forth words for execution.

Additional overhead is created by the execution of the small piece of machine code

starting at the CFA. In MacForth this is a TRAP instruction (38 cycles) plus three lines

of assembly code that the trap vector points to (32 cycles). Counting everything up,

execution of one colon-defined Forth word requires 100 or 120 cycles under

MacForth, depending whether or not the token table has to be accessed. In a system like

NEON, the CFA does not contain a trap, but a JSR to a routine that sets up the inner

interpreter, with a similar amount of time involved.

One solution around this problem is to code time-critical things in assembly

language, so that the whole definition is made up of executable 68000 code, starting

with the code field. I have given an example for this in MT V1#9 by defining macro

words that compile inline code into a definition. The speedup was of the order of 50% for

MacForth. Of course, if you write inline macros, you are not really creating threaded

code, but linear code, and the size of the definition tends to get rather large.

Still, there is another way to create threaded, i.e. hierarchical, code that does

away with the needs for an 'inner interpreter' loop completely, since the threaded code

will be directly executable 68000 machine language. This sounds strange, but is

actually very simple to achieve. The keyword is 'subroutine threading'. As you saw

from the inline code example, we can do without a 'code field' by just starting to execute

our definition at the very beginnning, provided it is all bona fide 68000 code. Now, all

we have to do is to persuade the Forth system not to compile a list of addresses into a

colon definition, but a list of JSR (addr). This way, we don't need the inner interpreter

at all, Forth instructions are 68000 instructions; the subroutine return stack, in this

case, is the normal 68000 stack, pointed to by A7 (Mach 1 uses still a different stack

Referenced by (4):